Goto

Collaborating Authors

 noisy distance sample


Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

We consider the problem of learning the nearest neighbor graph of a dataset of n items. The metric is unknown, but we can query an oracle to obtain a noisy estimate of the distance between any pair of items. This framework applies to problem domains where one wants to learn people's preferences from responses commonly modeled as noisy distance judgments. In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity. In contrast to existing work that forces Euclidean structure, our method is valid for general metrics, assuming only symmetry and the triangle inequality. Furthermore, we demonstrate efficiency of our method empirically and theoretically, needing only O(n\log(n)\Delta^{-2}) queries in favorable settings, where \Delta^{-2} accounts for the effect of noise. Using crowd-sourced data collected for a subset of the UT~Zappos50K dataset, we apply our algorithm to learn which shoes people believe are most similar and show that it beats both an active baseline and ordinal embedding.


Reviews: Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

As a non theory person I was able to follow the motivation, the problem set-up and the main result. The intuition that not all queries Q(j,k) needs to be issued conditioned on the past queries due to the fact that we're in a metric space. The bounds looks fine, mostly it is a construction of the confidence in the estimation of Q(j,k), again, under the constraint that Q(j,k) is a metric space with metric-ish properties. As a piece of technical achievement this paper is just fine. But it can improve in the following sense of story-telling and organisation: 1) we're doing an optimal query problem, namely, querying a noisey oracle Q(j,k) to construct a nearest-neighbor graph G(x_i,x_j).


Reviews: Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

This paper defines a new learning problem of learning 1-NN graph on a metric space consisting of n points using a noisy distance oracle. Ignoring the metric structure, the problem can be viewed n separate best arm identification problems (for each node find its nearest neighbor by calling the oracle). The best arm identification problem can solved by UCB-like methods that maintain confidence intervals for each distance and has been studied extensively in the literature. The paper proposes an improved algorithm that exploits the triangle inequality and symmetry of the underlying metric in order to improve the confidence intervals. As the paper shows that the improvement doesn't help much for the worst-case metric space.


Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

We consider the problem of learning the nearest neighbor graph of a dataset of n items. The metric is unknown, but we can query an oracle to obtain a noisy estimate of the distance between any pair of items. This framework applies to problem domains where one wants to learn people's preferences from responses commonly modeled as noisy distance judgments. In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity. In contrast to existing work that forces Euclidean structure, our method is valid for general metrics, assuming only symmetry and the triangle inequality.


Greedy $k$-Center from Noisy Distance Samples

arXiv.org Machine Learning

We study a variant of the canonical $k$-center problem over a set of vertices in a metric space, where the underlying distances are apriori unknown. Instead, we can query an oracle which provides noisy/incomplete estimates of the distance between any pair of vertices. We consider two oracle models: Dimension Sampling where each query to the oracle returns the distance between a pair of points in one dimension; and Noisy Distance Sampling where the oracle returns the true distance corrupted by noise. We propose active algorithms, based on ideas such as UCB and Thompson sampling developed in the closely related Multi-Armed Bandit problem, which adaptively decide which queries to send to the oracle and are able to solve the $k$-center problem within an approximation ratio of two with high probability. We analytically characterize instance-dependent query complexity of our algorithms and also demonstrate significant improvements over naive implementations via numerical evaluations on two real-world datasets (Tiny ImageNet and UT Zappos50K).


Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

We consider the problem of learning the nearest neighbor graph of a dataset of n items. The metric is unknown, but we can query an oracle to obtain a noisy estimate of the distance between any pair of items. This framework applies to problem domains where one wants to learn people's preferences from responses commonly modeled as noisy distance judgments. In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity. In contrast to existing work that forces Euclidean structure, our method is valid for general metrics, assuming only symmetry and the triangle inequality.